Kokkos support for Lagrange and Monomial elements #4441
Kokkos support for Lagrange and Monomial elements #4441rochi00 wants to merge 9 commits intolibMesh:develfrom
Conversation
rochi00
commented
Apr 22, 2026
- Shape functions — nativeShape, nativeGradShape for LAGRANGE (12 types) + MONOMIAL (orders 0-5)
- Type dispatch — FEShapeKey, classFromTopology, nDofs, templated nativeMapShape
- Quadrature — GaussQuadrature device-callable for EDGE, QUAD, HEX, TRI, TET + fillQuadrature host helper
- Physical map — physicalPoint, jacobian, physicalPointAndJacobian, faceJacobian, faceJxW, faceNormal (templated + runtime versions)
- Device-safe asserts — printf + Kokkos::abort() for debug builds on GPU
27ce29f to
dcdc0df
Compare
- Add include/libmesh/kokkos/ with device-callable FE math headers:
scalar_types.h, fe_types.h, fe_base.h, fe_evaluator.h,
fe_lagrange_{1,2,3}d.h, fe_monomial.h, fe_face_map.h
- Add src/kokkos/fe_types.C with non-inline FEElemTopology/FEFamily
conversion functions and nDofs overloads
- Add --with-kokkos=DIR configure option (m4/libmesh_optional_packages.m4)
that defines LIBMESH_HAVE_KOKKOS and LIBMESH_ENABLE_KOKKOS conditional
- Conditionally compile src/kokkos/fe_types.C into libmesh (Makefile.am)
- Register HAVE_KOKKOS in include/libmesh_config.h.in template
All FE math types live in namespace libMesh::Kokkos.
- Remove tests for deleted functions: toKokkosTopology, toKokkosFamily, nDofs(FEFamily, FEElemTopology), and FEElemTopology enum contiguity - Update FEElemTopology::* -> libMesh::ElemType values throughout - Update FEFamily::* -> libMesh::FEFamily values in FEShapeKey construction - Update FEElemTopology topo field -> libMesh::ElemType topo in test struct - Replace nativeShape(ElemType,...) -> nativeMapShape(LAGRANGE_MAP,ElemType,...) since the ElemType overload was removed; nativeShape now takes FEShapeKey - Replace nativeGradShape(ElemType,...) -> nativeGradMapShape(LAGRANGE_MAP,...)
The PETSc-bundled Kokkos has broken half_impl_t support with certain CUDA toolkit versions. Define KOKKOS_HALF_T_IS_FLOAT before including Kokkos_Core.hpp to bypass the incompatible half-precision code paths. The tests do not use half-precision types.
GaussLegendre1D: 1-D Gauss-Legendre rules (1-7 points), device-callable. GaussQuadrature: Full topology dispatcher with n_points/point/weight methods for EDGE, QUAD, HEX (tensor product), TRI (Dunavant), and TET (Keast/Walkington). All methods are LIBMESH_DEVICE_INLINE. fillQuadrature: Host-side convenience wrapper using std::vector.
…8fb6b91) with merged Kokkos build fix (PR libMesh#66)
b160299 to
1dc4dd1
Compare
| // License along with this library; if not, write to the Free Software | ||
| // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA | ||
|
|
||
| #pragma once |
There was a problem hiding this comment.
In practice I think literally every compiler we care about supports this (though Wiki lists one I've never heard of that doesn't!), but since it never made it into a C++ standard we've always used include guards instead; let's do that here too.
| #define libmesh_noexcept noexcept | ||
|
|
||
| #if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__) | ||
| // GPU device code does not support C++ exceptions — silently no-op. |
There was a problem hiding this comment.
Mostly when we throw an exception it's because we can't sanely go on; even if we don't support exceptions on device I don't think silently going on is the right way to handle that. When we don't support exceptions on the host we print the error message and abort; should we not do that here too?
| namespace libMesh { | ||
|
|
||
| /** | ||
| * \enum libMesh::FEElemClass groups element types by topological class, |
There was a problem hiding this comment.
Concept is good, but I don't like the name. FEFamily really is FE Method specific; an Elem is an Elem even if we're doing Finite Volume or pure geometry/meshing work (or in the future FDM stencils or whatever) instead. Let's just call this ElemClass, unless someone has an idea for a better name?
| // | ||
| // Order is only meaningful for MONOMIAL specializations. | ||
| // Lagrange specializations always use Order = 0 (the default) because the | ||
| // ElemType (e.g. QUAD9) already encodes the polynomial order. |
There was a problem hiding this comment.
No it doesn't. p=1 Lagrange on a Quad9 makes perfect sense and is commonly used for the pressure variable in incompressible flow simulation.
I get that we're not supporting everything right out of the gate, but we need to make sure that we're not digging ourselves into any holes in our documentation or (more importantly, since it'll be much harder to change later) our APIs. Is there something in the design that would already prevent us from returning evaluations of a p=1 Lagrange on a higher-order geometric element?
| // ── On-device helpers: element class -> spatial dimension ───────────────────── | ||
|
|
||
| LIBMESH_DEVICE_INLINE unsigned int | ||
| dimFromClass(FEElemClass cls) |
There was a problem hiding this comment.
I'd like to use snake_case rather than camelCase, like we do everywhere else in libMesh code.
We should also try to get this to resemble existing code in less superficial ways if possible. We've got static const unsigned int Elem::type_to_dim_map[] already, for instance; could we either add an Elem::class_to_dim_map[] too, or if elem.h would cause Kokkos fits, would it make sense for us to still use the same array-based design to make things easier to merge later?
|
|
||
| // --------------------------------------------------------------------------- | ||
| // Free operators not covered by TypeVector/TypeTensor arithmetic | ||
| // (scalar±vector forms that don't exist in the base classes) |
There was a problem hiding this comment.
They don't, because that's kind of horrifying for type safety. Even if they did we'd probably decide that the proper projection of s from ℝ→ℝ³ was (s,0,0), not (s,s,s). That's not even of magnitude s!
Where do we use these? If the answer is "nowhere" let's delete them; if not let's give them new names so I can grep for those names and stare at the hits suspiciously.
| parent_pt(0) = pt(0); | ||
| parent_pt(1) = pt(1); | ||
| parent_pt(2) = pt(2); | ||
| return parent_pt; |
There was a problem hiding this comment.
Wait, what?
There are reference coordinates on a point element; they're all just (0,0,0).
But (x,y,z) is what point(0) will give you, and that is not (xi,eta,zeta). point(0) on a NodeElem will be a point in physical space.
| for (unsigned int k = 0; k < n; ++k) | ||
| { | ||
| const Real s = face_qpt(0); | ||
| const Real t = face_qpt(1); | ||
| const Real psi = nativeMapShape(mapping_type, side_topo, k, s, t, 0.0); | ||
|
|
||
| const auto & pt = side_in_parent.point(k); | ||
| parent_pt(0) += psi * pt(0); | ||
| parent_pt(1) += psi * pt(1); | ||
| parent_pt(2) += psi * pt(2); | ||
| } |
There was a problem hiding this comment.
Okay, it's midnight, and I'm getting tired of tearing my hair out, and I think I've already written enough to keep you going for a while, so I'm going to post this review half done, with one final issue:
Is this whole block not just wrong? It gives the location in physical space of face_qpt in side-reference-space, but the docs here say we're returning a point in parent-reference space. There has to be an inverse_map() afterward to get back to that.
Please tell me I'm wrong, and I just lose too many IQ points by midnight? We clearly need way more test coverage if I'm somehow our first line of defense for a bug this big. I know we don't have the environments set up to test in PR yet, but ideally you're running make check yourself and at some point in the new tests we're doing a side integral?
| [submodule "contrib/metaphysicl"] | ||
| path = contrib/metaphysicl | ||
| url = ../../libMesh/MetaPhysicL | ||
| url = ../../rochi00/MetaPhysicL |
| if LIBMESH_ENABLE_KOKKOS | ||
| endif |
There was a problem hiding this comment.
Actually, one more comment.
There's an interesting story about the rock band Van Halen:
When they were on tour, they had a fairly complicated and safety-critical set of contract stipulations that venues had to meet: heavy platforms and heavy equipment, serious electrical requirements, pyrotechnics, etc. And they had a little contract clause that demanded a bowl of M&Ms for them backstage, with all the brown M&Ms removed.
If someone challenged that clause, they waived it. They didn't care about not eating brown M&Ms. They just cared that the big long critical pile of contract text had actually been respected, entirely read by the people with the primary responsibility of reading it and making sure everything was set up correctly. If they hadn't been reading carefully enough to notice the brown-M&M-hating-divas bit, could they really be trusted to be getting the subtle, tricky mechanical and electrical and thermal stuff done and inspected safely? No; the band roadies would essentially have to reinspect everything to see which parts of it they'd have to redo.
It was not a good idea to be a venue that made the hard rock band and their annoyed roadies unable to trust any of your work.
There was a problem hiding this comment.
Sorry, I mostly paid attention to the tests that were written, and once they all passed assumed the internal code was correct. That was clearly not a good idea. I will review everything now. The tests clearly didn't have enough coverage
There was a problem hiding this comment.
The claude generated comments had so many issues, it is terrifying how much it overclaims
…evice-safe abort - Replace #pragma once with traditional include guards in all gpu/ headers - Rename all camelCase functions to snake_case (libMesh convention) - Refactor FEShapeKey to carry ElemType instead of FEElemClass - Add lagrange_shape_topology_for_key() matching libMesh dispatch pattern - LIBMESH_THROW on device: print + abort instead of silent no-op - Remove lossy FEElemClass indirection from dispatch path - Simplify Real3 to alias of RealVectorValue - Add comprehensive physics shape tests for exact Lagrange keys
Deduplicate Gauss quadrature point/weight tables between the CPU
quadrature sources (quadrature_gauss_{1,2,3}D.C) and the GPU
kokkos_quadrature.h header by extracting them into a shared
device-callable header. Also extends kokkos_fe_types.h with
additional n_dofs support and refines face map implementation.